Machine Translation Quality Estimation from Memsource MTQE

Machine translation (MT) is making incredible progress. Neural machine translation is leading the way and achieving new heights in terms of quality and fluency. But it’s taking longer than expected for organizations to start using MT. One key reason for this is that despite MT innovations, MT quality is still unpredictable. Being able to estimate MT quality appears to be the missing link in large-scale MT adoption.

Last year, the Memsource AI team set out to address this problem.

The Idea

The Memsource in-house AI team was created with the aim of using machine learning to solve problems that localization professionals face every day.

“From the beginning, we looked for areas where machine learning could help our users work more effectively, and given our users’ challenges around machine translation, it seemed like a great place to start,” said Ales Tamchyna, Head of Memsource’s AI team.

When using MT, you don’t know in advance whether the output will be high quality or the complete opposite. Without this information, it is hard to determine quickly whether machine translation is right for a project and how much editing will be required.

“I think what helped us to decide to pursue a project around MT quality estimation was seeing that high-quality MT really helps speed up professional translation,” said Tamchyna.

The data in the graph below shows that the better the MT, the faster it is for a linguist to produce a final translation. “It’s a nice, nearly linear trend,” added Tamchyna. “You see gains already when MT quality is around 60. We realized, if we could supply this quality information from the start of a translation, we could not only help make MT post-editing faster, but also make project quotes more accurate.”

The AI team had data from historical translations where MT post-editing was used. The next step was to turn it into something meaningful for users.

The development process involved a lot of back-and-forth, including many exploratory tasks and preliminary experiments, but there were three main stages:

1. Establishing the rough neural network architecture and the training process, including data processes.
2. Experimenting with different settings of MTQE categories and tuning the networks in various ways. The AI team was interested not only in the best possible performance, but also in computational efficiency. In the end, they obtained quite an efficient model without sacrificing predictive power.
3. Turning a prototype into a production-ready system and training the final version of the models for all supported language pairs.

The Solution: Machine Translation Quality Estimation (MTQE)

All the development and training culminated in the beta version of Memsource’s machine translation quality estimation feature.

Given that this was such a unique feature, the first version involved some trial and error. “MTQE version 1 focused on perfect and near-perfect machine translation output. Machine translation provides the best results for shorter segments, so, as we heard from the early MTQE adopters, the majority of the MTQE results were just for short segments,” said Tamchyna.

While there was a lot of positive feedback, as our MTQE feedback webinar shows, it became clear from those who tried MTQE that the coverage of the scores was too limited. So, the AI team decided to find a way to ensure that the scores covered more content.

“The second version of MTQE is based on redesigned AI models and a new neural network architecture,” said Tamchyna. “We evaluated many approaches and ran probably hundreds of experiments. Ultimately, we wanted to make MTQE much more robust.”

Initial testing of the second version of MTQE indicates that for certain language pairs, quality scores are available in up to 60% of segments, four times more than in version 1, which could equate to up to 10% savings on post-editing costs.

The latest version of the MTQE feature can be used with any of the 30+ MT engines supported in Memsource and with 92 different language pairs. Currently, MTQE version 2 is still in beta as we are continuing to test the feature and gather user feedback.

These are the scoring categories:

100% MT
Perfect MT output, probably no post-editing required

99% MT
Near-perfect MT output, possibly minor post-editing required for mostly formatting or punctuation issues

75% MT
High-quality MT output, worth post-editing

No score
When there is no score, it is very likely that the MT output is of low quality. In general, it is recommended that this output not be post-edited but used for reference only.
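
The categories above can be read as a simple lookup from score to recommended handling. The following is a minimal sketch of that idea, not Memsource's implementation: the function names, the use of `None` for "No score", and the sample job are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Recommended handling per MTQE category, paraphrased from the list above.
# Keys are the displayed score; None stands for "No score" (an assumption
# made for this sketch, not a documented Memsource data format).
RECOMMENDATION = {
    100: "accept: probably no post-editing required",
    99: "light post-edit: check formatting and punctuation",
    75: "post-edit: high-quality output worth editing",
    None: "reference only: output is likely low quality",
}


def recommend(score: Optional[int]) -> str:
    """Return the suggested handling for one segment's MTQE score."""
    if score not in RECOMMENDATION:
        raise ValueError(f"unexpected MTQE score: {score!r}")
    return RECOMMENDATION[score]


def coverage(scores: Sequence[Optional[int]]) -> float:
    """Share of segments that received any MTQE score (0.0 if empty)."""
    if not scores:
        return 0.0
    scored = sum(1 for s in scores if s is not None)
    return scored / len(scores)


# Hypothetical scores for a five-segment job.
job = [100, 99, None, 75, None]
for segment_score in job:
    print(segment_score, "->", recommend(segment_score))
print(f"coverage: {coverage(job):.0%}")
```

The `coverage` helper mirrors the coverage figure discussed in this article: the fraction of segments for which any score is available.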

Find out how to set up and use MTQE.

The graph below shows the coverage of the MTQE scores for the top eight language pairs used in Memsource.

“Now with MTQE data available from the beginning, users can decide whether it makes sense to use MT, what the potential post-editing savings could be, and whether it’s faster to post-edit the MT or translate from scratch,” added Tamchyna.

The Future

For the Memsource AI team, MTQE is just the beginning of what’s possible when it comes to combining machine translation and AI, and they have a lot more ideas in the pipeline.

“Overall, we expect the use of MT engines to become much more streamlined,” said Tamchyna. “The translation platform should automatically select the most suitable engine(s) for the given content and once the machine translation is finished, it should estimate the quality of the translation. I think we will also see greater integration of machine translation with custom resources, such as translation memories or term bases. There are a lot of exciting features to look forward to – some coming very soon.”